Abstract

We give an algorithm for learning symmetric $k$-juntas (boolean functions of $n$ boolean
variables which depend only on an unknown set of $k$ of these variables) in the PAC model under
the uniform distribution, which runs in time $n^{O(k/\log k)}$. Our bound is obtained by proving the
following result: Every symmetric boolean function on $k$ variables, except for the parity and the
constant functions, has a non-zero Fourier coefficient of order at least 1 and at most $O(k/\log k)$.
This improves the previously best known bound of $3k/31$ [?], and provides the first $n^{o(k)}$ time
algorithm for learning symmetric juntas.

1 Introduction 

We consider a fundamental problem in computational learning theory: learning in the presence
of irrelevant information. One formalization of the problem is as follows: We want to learn an
unknown boolean function of $n$ variables, which depends only on $k \ll n$ variables (typically $k$
is $O(\log n)$). We call such a function a $k$-junta. We are provided with a set of labelled examples
$(x, f(x))$, where the $x$'s are picked uniformly and independently at random from the domain $\{0,1\}^n$
(this is the PAC model with uniform distribution). We wish to identify the $k$ relevant variables
and the truth table of the function.

The problem was first posed by Blum [?] and Blum and Langley [?], and it is considered [2, 13]
to be one of the most important open problems in the theory of uniform distribution learning.
It has connections with learning DNF formulas and decision trees of super-constant size, see [?]
for details. The general case is believed to be hard and has even been used to
propose a cryptosystem [?]. A trivial algorithm runs in time roughly $n^k$ by doing an exhaustive
search over all possible sets of relevant variables. Two important classes of juntas are learnable
in polynomial time: parity and monotone functions. Learning parity functions can be reduced to
solving a system of linear equations over $\mathbb{F}_2$. Monotone functions have non-zero singleton Fourier
coefficients (e.g., see [13]). For the general case, the first significant breakthrough was given in [13]:
learning with confidence $1 - \delta$ in time $n^{0.7k}\,\mathrm{poly}(2^k, n, \log 1/\delta)$. Note that we allow the running
time to be polynomial in $2^k$, since this is the size of the truth-table which is output. In the typical
setting of $k = O(\log n)$, this becomes polynomial in $n$.

In this paper we consider the class of symmetric $k$-juntas, functions which are symmetric on
their relevant variables. The only non-trivial algorithm known for this case is the standard Fourier
based algorithm, described in Section 2. The analysis of the running time of this algorithm reduces
to the following question:

What is the smallest $t$ such that every symmetric boolean function on $k$ variables, which
is not a constant or a parity function, has a non-zero Fourier coefficient of order at least
1 and at most $t$?

A bound of $t_0$ implies a running time of roughly $n^{t_0}$. An earlier bound was provided in [?]. This
was improved to $3k/31$ in [?]. Here we show a bound of $O(k/\log k)$ (Theorem 3.3), giving the first
algorithm for learning symmetric $k$-juntas in time $n^{o(k)}$.

Techniques 

Our techniques involve a mix of number theory, combinatorics and probability. We start by reducing
our problem to finding 0/1 solutions to a system of Diophantine equations involving binomial
coefficients, as in [?]. We then take a departure from [?] by further reducing this to the problem
of showing that a certain integer-valued polynomial $P$ is constant over the set $\{0, 1, \dots, k\}$. We
manage to prove this in two steps: First, we show that $P$ is constant over the union of two small
intervals $\{0, \dots, t\} \cup \{k - t, \dots, k\}$. This is obtained by looking at $P$ modulo carefully chosen prime
numbers. To choose these prime numbers we use the Siegel-Walfisz theorem on the density of primes
in arithmetic progressions with modulus of moderate growth. In the second step, we extend the
constant nature of $P$ to the whole interval $\{0, \dots, k\}$ by repeated applications of Lucas' Theorem.
One additional interesting aspect of our proof is the use of an equivalence between a) the vanishing 
of Fourier coefficients and b) the equality of moments of certain random variables under the uniform 
measure on the hypercube and under the measure defined by the function itself. This equivalence 
helps us eliminate a lot of case analysis. 

2 Preliminaries 

Symmetric Juntas 

Given a boolean function $f$ on $n$ variables $x_1, \dots, x_n$, we will say that $x_i$ is a relevant variable for $f$
if there exist $x, y \in \{0,1\}^n$ which differ only in the $i$-th coordinate and $f(x) \ne f(y)$. Variables that
are not relevant are called irrelevant. We will call $f$ a $k$-junta if $f$ has at most $k$ relevant variables.

We consider the class of symmetric juntas. A boolean function $f : \{0,1\}^k \to \{0,1\}$ on $k$
variables is a symmetric function if for any permutation $\pi \in S_k$, $f(x_1, \dots, x_k) = f(x_{\pi(1)}, \dots, x_{\pi(k)})$.
Hence the value of $f$ at $(x_1, \dots, x_k)$ depends only on the weight of $(x_1, \dots, x_k)$, which is the number
of variables that are set to 1. A symmetric $k$-junta is a function on $n$ variables which is symmetric
on the $k$ variables it depends on.

We will describe a symmetric boolean function on $k$ variables by a $(k+1)$-bit string $f_0 f_1 \cdots f_k$,
where $f_i$ is the value of $f$ on an input of weight $i$. The following four special symmetric functions
on $k$ variables will appear often: the two constant functions 0 and 1, the parity function $\oplus$, and
its complement $\bar{\oplus}$.

Learning in the PAC model 

We consider the PAC learning model [?], in which we wish to learn a concept class $C = \bigcup_n C_n$,
where each $C_n$ is a collection of boolean functions $\{0,1\}^n \to \{0,1\}$. In our case, $C_n$ is the class
of symmetric $k$-juntas on $n$ variables. Let $\epsilon$ be an accuracy parameter and $\delta$ a confidence parameter.
A learning algorithm $A$ for $C$ has access to an oracle for $f \in C_n$. A query to the oracle outputs a
labeled example $(x, f(x))$, where $x$ is drawn from $\{0,1\}^n$ according to some probability distribution
$D$. $A$ is said to be a learning algorithm for the class $C$ under the distribution $D$ if for all $f \in C$, it
outputs, with probability at least $1 - \delta$, a hypothesis $h$ such that $\Pr_x[h(x) = f(x)] \ge 1 - \epsilon$. We will
be concerned only with the uniform distribution and we will obtain an algorithm with accuracy
parameter $\epsilon = 0$, i.e., we identify the exact function $f$.

Fourier Transform 

We will consider functions of the form $f : \{0,1\}^n \to \mathbb{R}$. An orthonormal basis for the functions
defined on the Boolean cube can be given by the characters of the group $\mathbb{Z}_2^n$. In particular, for
every $S \subseteq \{1, \dots, n\}$, define the following function:

$$\chi_S(x) = (-1)^{\sum_{i \in S} x_i}.$$

Any real-valued function on the Boolean cube can be expressed as a linear combination of the
functions $\chi_S$. Given $f$, we have that $f(x) = \sum_S \hat f(S)\chi_S(x)$, where $\hat f(S)$ is the Fourier coefficient
of $f$ at $S$ and is equal to the inner product of $f$ with $\chi_S$:

$$\hat f(S) = \frac{1}{2^n} \sum_{x \in \{0,1\}^n} f(x)\chi_S(x).$$
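As an illustration of the two formulas above, the following minimal sketch computes Fourier coefficients by brute force over the whole cube and checks the inversion formula. The function names, the 0-based indexing of coordinates, and the choice of parity on $n = 3$ bits are assumptions made for this example only; exact rational arithmetic is used so that "non-zero" has an unambiguous meaning.

```python
from fractions import Fraction
from itertools import combinations, product

def fourier_coefficient(f, n, S):
    """Exact Fourier coefficient of f: {0,1}^n -> R at the index set S."""
    total = Fraction(0)
    for x in product((0, 1), repeat=n):
        # chi_S(x) = (-1)^(sum of x_i over i in S)
        sign = -1 if sum(x[i] for i in S) % 2 else 1
        total += sign * f(x)
    return total / 2**n

def parity(x):
    return sum(x) % 2

n = 3
spectrum = {S: fourier_coefficient(parity, n, S)
            for size in range(n + 1)
            for S in combinations(range(n), size)}

def reconstruct(x):
    """Inversion formula: f(x) = sum over S of coefficient * chi_S(x)."""
    return sum(c * (-1) ** sum(x[i] for i in S) for S, c in spectrum.items())
```

For parity, only the empty set and the full set carry non-zero coefficients, which is why parity has to be excluded from the question studied in this paper.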

Fourier-based Learning 

Let $f$ be a $k$-junta. It is known that we can exactly calculate the Fourier coefficients of $f$ in the
uniform distribution PAC model, with confidence $1 - \delta$ in time $\mathrm{poly}(2^k, n, \log(1/\delta))$, using standard
Chernoff-Hoeffding bounds (see [10, 13]). Observe further that if $x_i$ is an irrelevant variable for a
$k$-junta $f$, then for any $S \subseteq \{x_1, \dots, x_n\}$ containing $x_i$, $\hat f(S) = 0$. Hence if $\hat f(S) \ne 0$ for some $S$,
then $S$ contains only relevant variables.

This suggests the following algorithm: Starting with $l = 1$ and increasing $l$ by one in each round,
compute the Fourier coefficients of all subsets of $\{x_1, \dots, x_n\}$ of size $l$. Collect the union of all
relevant variables that correspond to subsets with non-zero Fourier coefficients. Stop as soon as
you collect all $k$ relevant variables.

Since the function is symmetric, for any two sets $S, T$ of relevant variables such that $|S| = |T|$,
we have $\hat f(S) = \hat f(T)$. Hence the first time that we identify some relevant variables in the
algorithm, we will actually be able to identify all the relevant variables. Once we find the relevant
variables, finding the truth-table of the function can be done in time $\mathrm{poly}(2^k, \log(1/\delta))$.

The above algorithm would take time roughly $n^k$ for $f \in \{0, 1, \oplus, \bar{\oplus}\}$. However, these particular
functions are well known to be learnable in time $\mathrm{poly}(n, \log(1/\delta))$. Hence the following is true:

Fact 2.1. If every symmetric function $f \notin \{0, 1, \oplus, \bar{\oplus}\}$ has a non-zero Fourier coefficient of order
between 1 and $t$, then we can learn symmetric $k$-juntas in time $n^t\,\mathrm{poly}(2^k, n, \log(1/\delta))$.
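The level-by-level search described in this section can be sketched as follows. This is an illustration under simplifying assumptions, not the sample-based procedure itself: the coefficients are computed exactly from the full truth table (standing in for the Chernoff-Hoeffding estimates), coordinates are 0-based, and the target `hidden_majority` and the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product

def fourier_coefficient(f, n, S):
    """Exact coefficient from the full truth table (stand-in for sampling)."""
    total = 0
    for x in product((0, 1), repeat=n):
        sign = -1 if sum(x[i] for i in S) % 2 else 1
        total += sign * f(x)
    return Fraction(total, 2**n)

def find_relevant_variables(f, n, max_order):
    """Return (order, variables) for the first level that has a non-zero
    coefficient, or (None, empty set) if no level up to max_order does."""
    for order in range(1, max_order + 1):
        found = set()
        for S in combinations(range(n), order):
            if fourier_coefficient(f, n, S) != 0:
                found.update(S)
        if found:
            return order, found
    return None, set()

def hidden_majority(x):
    # Hypothetical target: majority of coordinates 1, 3, 4 out of n = 6.
    return int(x[1] + x[3] + x[4] >= 2)

order, relevant = find_relevant_variables(hidden_majority, n=6, max_order=3)
```

Because the target is symmetric on its relevant variables, the first level with a non-zero coefficient already exposes every relevant variable; the cost of the search is governed by the order at which this happens, which is the quantity bounded in Fact 2.1.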

3 Main Section 

3.1 An Equivalent Formulation 

We state an equivalent condition for the existence of a non-zero Fourier coefficient of a boolean
function $f$, as proved in [?]. Let $f : \{0,1\}^k \to \{0,1\}$ be a boolean function. For a vector
$x = (x_1, \dots, x_k)$ and a set $S \subseteq [k]$, let $x_S$ be the projection of $x$ on the indices of $S$. Let
$a \in \{0,1\}^{|S|}$. Define the following probabilities:

$$p_{S,a}(f) := \Pr[f(x) = 1 \mid x_S = a].$$

Unless mentioned, all probabilities are over the uniform distribution on $\{0,1\}^k$. For $t \ge 1$, call
a boolean function $f$ on $k$ variables $t$-null if for all sets $S \subseteq [k]$ with $|S| = t$, and for all $a \in \{0,1\}^t$,
the probabilities $p_{S,a}(f)$ are all equal to each other. The following lemma reveals the connection
with the Fourier coefficients of $f$.

Lemma 3.1. Let $f$ be a boolean function on $k$ variables. Then $f$ is $t$-null for some $1 \le t \le k$
if and only if, for all $\emptyset \ne S \subseteq [k]$ with cardinality at most $t$, $\hat f(S) = 0$.

It is clear that if $s < t$ and $f$ is $t$-null then it is also $s$-null.

When we consider the case of symmetric functions, $p_{S,a}(f)$ just depends on $t := |S|$ and the
weight $w$ of $a$. We denote this by $p_{t,w}(f)$. It is clear that:

$$p_{t,w}(f) = \frac{1}{2^{k-t}} \sum_{i=0}^{k} f_i \binom{k-t}{i-w}, \qquad (1)$$

where $\binom{l}{m}$ is 0 if $m < 0$ or $m > l$, and $\binom{0}{0}$ is 1. It follows that $f$ is $t$-null if for $0 \le w \le t$,
$p_{t,w}(f)$ are all equal. It is easy to see that the constant boolean functions $\{0, 1\}$ are $t$-null for all
$t$ with $1 \le t \le k$. The parity functions $\{\oplus, \bar{\oplus}\}$ are also $t$-null for all $t$ satisfying $1 \le t < k$. From
Lemma 3.1 and Equation (1) we get:

Corollary 3.2. All symmetric boolean functions $f \notin \{0, 1, \oplus, \bar{\oplus}\}$ have a non-zero Fourier
coefficient of order at most $t_0$ (and at least 1) iff $\{0, 1, \oplus, \bar{\oplus}\}$ are the only solutions to

$$\sum_{i=0}^{k} f_i \binom{k-t_0}{i} = \sum_{i=0}^{k} f_i \binom{k-t_0}{i-1} = \cdots = \sum_{i=0}^{k} f_i \binom{k-t_0}{i-t_0}. \qquad (2)$$

In the next section, we show that this is true for some $t_0 \le Ck/\log k$, for large enough $k$.
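The notion of $t$-nullity for symmetric functions is easy to experiment with. The sketch below is an illustration only: it represents a symmetric function by its weight string $f_0 f_1 \cdots f_k$ as a Python list, evaluates the conditional probabilities $p_{t,w}(f)$ as a binomial-weighted window sum, and reports the smallest $t$ at which $t$-nullity fails (by Lemma 3.1, the smallest positive order of a non-zero Fourier coefficient). The function names and the test functions are assumptions of this example.

```python
from fractions import Fraction
from math import comb

def p(f, t, w):
    """p_{t,w}(f) for a symmetric f given as the list [f_0, ..., f_k]."""
    k = len(f) - 1
    # C(k - t, i - w) vanishes outside the window w <= i <= w + k - t.
    total = sum(f[i] * comb(k - t, i - w) for i in range(w, w + k - t + 1))
    return Fraction(total, 2 ** (k - t))

def is_t_null(f, t):
    return len({p(f, t, w) for w in range(t + 1)}) == 1

def smallest_nonzero_order(f):
    """Smallest t >= 1 for which f is not t-null (None if there is none)."""
    k = len(f) - 1
    for t in range(1, k + 1):
        if not is_t_null(f, t):
            return t
    return None

k = 6
parity = [j % 2 for j in range(k + 1)]
majority = [int(2 * j > k) for j in range(k + 1)]
all_equal = [1, 0, 0, 1]   # on k = 3 bits: 1 iff all inputs agree
```

Here parity is $t$-null for every $t < k$ and fails only at $t = k$, a constant function never fails, majority already fails at $t = 1$, and the "all inputs agree" function on 3 bits is 1-null but not 2-null.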




3.2 A bound of $O(k/\log k)$.
The following is our main theorem.

Theorem 3.3. There is an absolute constant $C > 0$ such that for large $k$, every symmetric boolean
function $f$ on $k$ bits with $f \notin \{0, 1, \oplus, \bar{\oplus}\}$ has a non-zero Fourier coefficient of order at most
$Ck/\log k$ and at least 1.

The rest of this section is devoted to proving Theorem 3.3. Suppose $f$ is a symmetric boolean function on
$G = \mathbb{Z}_2^k$, such that all its Fourier coefficients of order at least 1 and up to $k - N$ are 0. Then the values $f_j$ of $f$
satisfy (2) with $t_0 = k - N$, which, changing parameters, can be rewritten as:

$$\sum_{j=0}^{N} \binom{N}{j} f_{\nu+j} = c_N, \quad \text{for all } \nu = 0, \dots, k - N. \qquad (3)$$

We want to show that if $k - N > Ck/\log k$, for some appropriately large constant $C > 0$, then $f_j$
is either constant or alternates between 0 and 1. We prove this for all $k$ sufficiently large.
Define $x_j = f_{j+1} - f_j$, for $j = 0, \dots, k - 1$, and observe that the sequence $x_j$ satisfies the
homogeneous version of (3):

$$\sum_{j=0}^{N} \binom{N}{j} x_{\nu+j} = 0, \quad \text{for all } \nu = 0, \dots, k - N - 1. \qquad (4)$$

Remark. In (4) the number $N$ can be replaced by any other integer $N_1$ in the interval $[N, k]$. This
follows since all the non-constant Fourier coefficients up to order $k - N$ are 0.

From (4) the sequence $x_j$ may be defined for all $j \in \mathbb{Z}$ and $x_j \in \mathbb{Z}$ for all $j$. From the theory
of recurrence relations we know then that the sequence $x_j$ may be written as a linear combination
of the following sequences:

$$(-1)^j,\ (-1)^j j,\ (-1)^j j^2,\ \dots,\ (-1)^j j^{N-1}.$$

The reason for this is that $-1$ is the only root of the characteristic polynomial of the recurrence,
$\phi(z) = \sum_j \binom{N}{j} z^j = (1 + z)^N$. Therefore there is a polynomial $P(x)$, of degree at most $N - 1$, such
that

$$x_j = (-1)^j P(j), \quad \text{for all } j \in \mathbb{Z}.$$
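To make the passage from the recurrence (4) to the polynomial $P$ concrete, here is a small worked check. It is a sketch on a hand-picked example, not part of the proof: the function $f = 1001$ on $k = 3$ bits (value 1 exactly when all inputs agree) has vanishing Fourier coefficients of order 1, so $N = k - 1 = 2$. The helper names are assumptions. The check uses the fact that a sequence agrees with a polynomial of degree at most $N - 1$ exactly when its $N$-th forward difference vanishes.

```python
from math import comb

def differences(f):
    """x_j = f_{j+1} - f_j for j = 0, ..., k-1."""
    return [f[j + 1] - f[j] for j in range(len(f) - 1)]

def satisfies_recurrence(x, N):
    """Check sum_j C(N, j) x_{nu+j} = 0 for every window that fits in x."""
    return all(sum(comb(N, j) * x[nu + j] for j in range(N + 1)) == 0
               for nu in range(len(x) - N))

def nth_difference_vanishes(values, N):
    """True if the N-th forward difference of the list is identically 0."""
    for _ in range(N):
        values = [b - a for a, b in zip(values, values[1:])]
    return all(v == 0 for v in values)

f = [1, 0, 0, 1]            # weight string f_0 f_1 f_2 f_3, with N = 2
x = differences(f)          # [-1, 0, 1]
P = [(-1) ** j * xj for j, xj in enumerate(x)]   # values P(0), P(1), P(2)
```

In this example $P(j) = j - 1$, a polynomial of degree $N - 1 = 1$ that is not constant, which is consistent with $f$ being neither constant nor a parity.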

Clearly $P(x)$ takes integer values on integers and in particular $P(j) \in \{-1, 0, 1\}$ for $j = 0, \dots, k - 1$.
From the well known characterization of integer-valued polynomials it follows that we may write

$$P(x) = \sum_{j=0}^{N-1} a_j \binom{x}{j}, \quad \text{with } a_j \in \mathbb{Z}. \qquad (5)$$

If $p > N$ is a prime, then, since all the factors that appear in denominators in (5) are strictly less
than $p$ (hence invertible mod $p$), it follows that the sequence $P(j) \bmod p$, $j \in \mathbb{Z}$, may be viewed as
a polynomial with coefficients in $\mathbb{Z}_p$ and therefore is a $p$-periodic sequence mod $p$, i.e.

$$P(j + p) = P(j) \bmod p, \quad \text{for all } j \in \mathbb{Z} \text{ and } p > N. \qquad (6)$$

If, in addition, $0 \le j < j + p < k$, then all $P$-values that appear in (6) are in $\{-1, 0, 1\}$, and it follows
that we have the non-modular equality

$$P(j + p) = P(j), \quad (N < p \le j + p < k). \qquad (7)$$
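The periodicity (6) can be checked numerically. The sketch below is illustrative: the coefficient vector is an arbitrary choice, the helper names are assumptions, and binomial coefficients with a negative upper argument are evaluated through the standard identity $\binom{x}{j} = (-1)^j \binom{j - x - 1}{j}$. It also shows that the condition $p > N$ matters, since $\binom{x}{p}$ itself is not $p$-periodic mod $p$.

```python
from math import comb

def binom(x, j):
    """Generalized binomial coefficient C(x, j) for integer x and j >= 0."""
    if x >= 0:
        return comb(x, j)
    return (-1) ** j * comb(j - x - 1, j)

def evaluate(coeffs, x):
    """P(x) = sum_j a_j * C(x, j) with integer coefficients a_j."""
    return sum(a * binom(x, j) for j, a in enumerate(coeffs))

def is_periodic_mod(coeffs, p, points):
    """Check P(x + p) = P(x) mod p at every x in points."""
    return all((evaluate(coeffs, x + p) - evaluate(coeffs, x)) % p == 0
               for x in points)

coeffs = [3, -1, 4, 1, -5]          # arbitrary example of degree 4, so N = 5
ok = is_periodic_mod(coeffs, 7, range(-20, 20))           # prime p = 7 > N
bad = is_periodic_mod([0, 0, 0, 0, 0, 1], 5, range(10))   # degree 5, p = 5
```

The step from (6) to (7) then only needs the values on both sides to lie in $\{-1, 0, 1\}$: two such integers that agree modulo a prime $p \ge 3$ must be equal.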

We want to show that $f \in \{0, 1, \oplus, \bar{\oplus}\}$. Since $x_j = f_{j+1} - f_j$ it is enough to show that either $x_j$
is identically 0 or that $x_j = (-1)^j$ or $x_j = (-1)^{j+1}$. This is equivalent to showing that $P$ is a
constant polynomial, constantly equal to $-1$, 0 or 1.

Notation. 

1. In what follows we repeatedly use the letter $C$ to denote a positive constant which depends on
no parameter (unless we say otherwise). As is customary, this constant $C$ need not be the same in
all its occurrences.

2. We define $\epsilon$ by the relation $k - N = \epsilon k$ and assume $\epsilon > C/\log k$, with $C$ a large enough positive
constant.

We shall need various primes in intervals from now on. The version of the prime number theorem
that we will be using is the Siegel-Walfisz theorem (see [?, Theorem 2]). Define the logarithmic
integral

$$\mathrm{Li}\,x = \int_2^x \frac{dt}{\log t} \sim \frac{x}{\log x}, \quad (x \to \infty).$$

The Euler function $\varphi(q)$ denotes the number of residues mod $q$ which are coprime to $q$.

Theorem A (Siegel-Walfisz) Let $\pi(x; M, a)$ be the number of primes $\le x$ which are equal to
$a \bmod M$ and assume that $(M, a) = 1$. Then if $M \le (\log x)^A$, $A$ a constant, we have

$$\pi(x; M, a) = \frac{\mathrm{Li}\,x}{\varphi(M)} + O\big(x \exp(-c\sqrt{\log x})\big), \quad (\text{as } x \to \infty), \qquad (8)$$

where $c$ depends on $A$ only (the constant in the $O(\cdot)$ term is absolute).

For $\pi(x)$, the number of primes up to $x$ without any restriction, the prime number theorem
says $\pi(x) = \mathrm{Li}(x) + O\big(x \exp(-c\sqrt{\log x})\big)$, for some constant $c$.

These theorems guarantee that, for $x \to \infty$, the interval $[x, x + \Delta]$ has the "expected" number
of primes whenever $\Delta > Cx/(\log x)^A$, whatever the constant $A$, even if we impose the condition
that these primes are equal to $a \bmod M$, as long as $M \le (\log x)^B$, for any constant $B$.

We use the above theorems along with the $p$-periodicity of $P$ to deduce that $P$ is in fact
2-periodic on the union of 2 small sub-intervals of $[0, k - 1]$.

Lemma 3.4. The polynomial $P$ satisfies the 2-periodicity condition

$$P(j) = P(j + 2),$$

whenever $j, j + 2 \in A = [0, k - N] \cup [N, k - 1]$.

Proof. Assume $q < r$ are two primes in $[N, N + h]$, where $h = (k - N)/3 = \epsilon k/3$. (The length of
the interval $[N, N + h]$ is large enough for the prime number theorem to guarantee the existence of
many primes in it.) From (7) it follows that the finite sequences

$$P(0), \dots, P(k - q) \quad \text{and} \quad P(q), \dots, P(k)$$

are identical. Applying (7) again with $r$ we get that the finite sequences

$$P(0), \dots, P(k - r) \quad \text{and} \quad P(r), \dots, P(k)$$

are identical. It follows that

$$P(j + r - q) = P(j), \quad \text{if } N + h \le j \le N + 2h \text{ and } r > q \text{ primes in } [N, N + h]. \qquad (9)$$

We now assume that the difference $M = r - q$ is the smallest difference between two primes in
$[N, N + h]$. By the prime number theorem $M \le C \log k$. Hence, we can apply Theorem A. Since
$\varphi(M) \le M \le C \log k$, in that case Theorem A guarantees that the number of primes equal to
$a \bmod M$ in $[N, N + h]$ is at least

$$C\,\frac{\epsilon k}{\log^2 k} \ge C\,\frac{k}{\log^3 k},$$

whenever $(M, a) = 1$. All that matters here is that this number is positive.

Let $t \in [N, N + h]$ be the smallest prime which is equal to $-1 \bmod M$. By Theorem A applied to
$M$ and $-1$, its existence is guaranteed and furthermore $t \sim N$. The same theorem guarantees
that we can find a prime $s \in (t, N + h]$ such that $s = 1 \bmod M$. Then $s - t = 2 \bmod M$ or
$s - t = \ell M + 2$, for some nonnegative integer $\ell$. Therefore, for $N + h \le j \le N + 2h$ we have

$$\begin{aligned}
P(j) &= P(j + s - t) && \text{(applying (9) for the primes } s, t) \\
&= P(j + \ell M + 2) \\
&= P(j + (\ell - 1)M + 2) && \text{(applying (9) for the primes } r, q) \\
&= \cdots \\
&= P(j + 2).
\end{aligned} \qquad (10)$$

□
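The choice of primes in the proof of Lemma 3.4 can be traced on a toy instance. The sketch below is only an illustration: the values $N = 1000$ and $h = 200$ are far from the asymptotic regime of the lemma, trial division stands in for any serious prime-finding method, and the function names are assumptions. It picks the pair $q < r$ of consecutive primes in $[N, N + h]$ with the smallest gap $M$, then the smallest prime $t \equiv -1 \pmod M$ and the next prime $s \equiv 1 \pmod M$, and reports $\ell$ with $s - t = \ell M + 2$.

```python
def primes_in(lo, hi):
    """Primes p with lo <= p <= hi, by trial division (small ranges only)."""
    def is_prime(n):
        if n < 2:
            return False
        d = 2
        while d * d <= n:
            if n % d == 0:
                return False
            d += 1
        return True
    return [n for n in range(lo, hi + 1) if is_prime(n)]

def step_two_witness(N, h):
    """Return (q, r, t, s, ell): r - q = M is the minimal prime gap in
    [N, N + h], t = -1 mod M, s = 1 mod M, s > t, and s - t = ell * M + 2."""
    ps = primes_in(N, N + h)
    q, r = min(zip(ps, ps[1:]), key=lambda pair: pair[1] - pair[0])
    M = r - q
    t = next(p for p in ps if p % M == (M - 1) % M)
    s = next(p for p in ps if p > t and p % M == 1 % M)
    ell = (s - t - 2) // M
    return q, r, t, s, ell

q, r, t, s, ell = step_two_witness(1000, 200)
```

In this toy range the smallest gap is $M = 2$, so a shift by $s - t = \ell M + 2$ followed by $\ell$ shifts back by $M$ leaves a net shift of exactly 2, as in the chain (10).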



Notice that in the sequence $x_j$, if one erases the 0's then one sees an alternation of $-1$ and 1
(this follows from the fact that $f_j \in \{0, 1\}$). This property greatly reduces the number of allowed
patterns in $x_j$ and in fact it implies that $P$ is constant in $A$.

Lemma 3.5. The polynomial $P$ is constant in $A$ (defined in Lemma 3.4).

Proof. From Lemma 3.4 the values of $P$ in $[N, k - 1]$ must be a 2-periodic sequence. The only
essentially different non-constant 2-periodic patterns for the values of $P$ in $[N, k - 1]$ are $010101\ldots$
and $(-1)1(-1)1\ldots$ and they both violate the property that $x_j = (-1)^j P(j)$ must satisfy, namely
that if one erases the 0's then one must see an alternation of 1 and $-1$. Therefore $P$ is constant in
each of the two intervals of $A$. From the $p$-periodicity (7) it follows that the constant is the same
in both intervals. □

We now extend the set on which $P$ is constant to a superset of $A$ that contains a small interval
around $k/2$. We will make use of the following theorem which follows from Lucas' Theorem (see [?]).

Theorem 3.6. If $r$ is a prime and $0 \le l \le m$, then $\binom{mr}{lr} = \binom{m}{l} \bmod r$.

Lemma 3.7. Let $a = (1/2 - \epsilon/2)k$ and $b = (1/2 + \epsilon/2)k$. Then $P(l) = P(0)$ for $a \le l \le b$.

Proof. We shall apply Theorem 3.6 with $m = 2$ and with a prime $r$ such that $2r - N$ takes the
minimal possible nonnegative value. It follows from the prime number theorem that $2r - N = o(\epsilon k)$.
And it follows from the remark after (4) that

$$\sum_{j=0}^{2r} \binom{2r}{j} x_{\nu+j} = 0, \quad (\nu \in \mathbb{Z}).$$

Taking residues mod $r$ and using Theorem 3.6 for $m = 2$ (the coefficients $\binom{2r}{j}$ with $r \nmid j$ vanish
mod $r$, again by Lucas' Theorem, and $r$ is odd) we obtain

$$P(\nu) - 2P(\nu + r) + P(\nu + 2r) = 0 \bmod r, \quad (\nu \in \mathbb{Z}).$$

By our particular choice of $r$ we have $P(\nu) = P(\nu + 2r) = P(0)$ whenever $\nu \in [0, k - N - o(\epsilon k)]$. It
follows that $P(\nu + r) = P(0)$. Applying this for all $\nu \in [0, k - N - o(\epsilon k)]$ we get $P(l) = P(0)$ for
all $l$ in the interval $(a + o(\epsilon k), b - o(\epsilon k))$. To get rid of the $o(\epsilon k)$ terms in the interval above, just
choose a slightly larger $r$ and apply again for all $\nu \in [0, k - N - o(\epsilon k)]$. □

So far we have proved $P(l) = P(0)$ on the set

$$A_2 = [0, k - N] \cup [a, b] \cup [N, k - 1]$$

which consists of three equispaced intervals of roughly equal size $\epsilon k$. We consider 2 cases for $P$.
The first is when $P$ is 0 on $A_2$ and the second is when $P$ is 1 or $-1$.

In the case that $P$ is 0 on $A_2$, we shall need the following theorem, which already gives a lot of
significant information about the function $f$. It should be thought of as analogous to the fact that
the moments of a (vector) random variable can be read off the Fourier Transform of its distribution
(the characteristic function) by looking at derivatives at 0.
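Theorem 3.8. Suppose $f : G = \mathbb{Z}_2^k = \{0,1\}^k \to \mathbb{R}$ is nonnegative (and not identically 0) and has
all its Fourier coefficients of order at most $r$ (and at least 1) equal to 0. Let $\mu$ denote the uniform
probability measure on the cube $G$ and $\nu$ denote the probability measure on $G$ defined by

$$\nu(x) = \frac{f(x)}{\sum_{y \in G} f(y)}.$$

Let also $X_1, \dots, X_k$ denote the coordinate functions on $G$, which we view as random variables.
Then for all $i_1 < i_2 < \cdots < i_s$, $1 \le s \le r$, we have

$$E_\nu(X_{i_1} \cdots X_{i_s}) = E_\mu(X_{i_1} \cdots X_{i_s}).$$

Proof. Let $F = \sum_{x \in G} f(x)$. We assume for simplicity that $i_1 = 1, \dots, i_s = s$. Then, writing
$x = (x_1, x_2, \dots, x_k)$ and $[s] = \{1, \dots, s\}$, we have

$$\begin{aligned}
E_\nu(X_1 \cdots X_s) &= \frac{1}{F} \sum_{x \in G} f(x)\, x_1 \cdots x_s \\
&= \frac{1}{F} \sum_{x \in G} f(x)\, \frac{1 + (-1)^{x_1 + 1}}{2} \cdots \frac{1 + (-1)^{x_s + 1}}{2} \\
&= \frac{|G|}{2^s F} \cdot \frac{1}{|G|} \sum_{x \in G} f(x) \sum_{S \subseteq [s]} (-1)^{|S|} \chi_S(x) \\
&= \frac{|G|}{2^s F} \sum_{S \subseteq [s]} (-1)^{|S|}\, \frac{1}{|G|} \sum_{x \in G} f(x) \chi_S(x) \\
&= \frac{|G|}{2^s F} \sum_{S \subseteq [s]} (-1)^{|S|} \hat f(S) \\
&= \frac{|G|}{2^s F}\, \hat f(\emptyset) \quad \text{(by the vanishing of } \hat f) \\
&= 2^{-s} \\
&= E_\mu(X_1 \cdots X_s).
\end{aligned}$$

□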



Remarks.

1. For functions $f : \{0,1\}^k \to \{0,1\}$, the above theorem follows directly from the definition of
$t$-nullity in Section 3.1. However, as we shall see in the proof of Lemma 3.10, we need to apply this
theorem for functions whose range is not $\{0, 1\}$.

2. If the nonnegative function $f$ is symmetric then the identity of moments up to order $r$ with those
of the uniform distribution ($r$-wise independence) and the vanishing of the non-constant Fourier
coefficients of weight up to $r$ are equivalent. This can be proved by induction on $r$. We do not use
this here.

Corollary 3.9. Under the assumptions and definitions of Theorem 3.8 the random variable $S =
X_1 + \cdots + X_k$ has the same power moments under the probability measures $\mu$ and $\nu$, up to order $r$.

Proof. The power $S^s$, $s \le r$, can be written as a sum of terms of the type $X_{i_1} \cdots X_{i_t}$, for $t \le s$.
One uses the fact that $X_j^2 = X_j$. □
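Corollary 3.9 can be observed on a small example. The sketch below is illustrative and uses parity on $k = 5$ bits as the test function, because its Fourier coefficients of orders $1, \dots, k - 1$ all vanish; the helper name `weight_moment` and the use of exact fractions are assumptions of this example. For a symmetric $f$ with weight string $f_0 \cdots f_k$, the weight $S$ takes the value $j$ with probability proportional to $\binom{k}{j} f_j$.

```python
from fractions import Fraction
from math import comb

def weight_moment(f, s):
    """E[S^s] when x in {0,1}^k is drawn with probability proportional to
    f(x); f is symmetric and given as the list [f_0, ..., f_k]."""
    k = len(f) - 1
    F = sum(comb(k, j) * f[j] for j in range(k + 1))
    return Fraction(sum(comb(k, j) * f[j] * j**s for j in range(k + 1)), F)

k = 5
uniform = [1] * (k + 1)                  # constant f gives the uniform measure
parity = [j % 2 for j in range(k + 1)]   # coefficients of orders 1..k-1 vanish

agree = [weight_moment(parity, s) == weight_moment(uniform, s)
         for s in range(1, k + 1)]
variance_uniform = weight_moment(uniform, 2) - weight_moment(uniform, 1) ** 2
```

The moments of orders 1 through 4 coincide and the fifth differs, matching $r = k - 1$; the variance of $S$ under the uniform measure evaluates to $k/4$.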

Lemma 3.10. If $P$ is 0 on $A_2$, then $f \in \{0, 1\}$.
Proof. Suppose the polynomial $P$ is constantly equal to 0 on the set $A_2$ and that $f \notin \{0, 1\}$. The
sequence $f_j$ is constant in each of the three intervals of $A_2$. By possibly considering $1 - f$ (whose
Fourier coefficients vanish exactly where those of $f$ do), we may assume that $f_j = 0$ on the middle
interval $(a, b)$. Define the nonnegative function $g : G \to \mathbb{R}$ by

$$g(x_1, \dots, x_k) = f(x_1, \dots, x_k) + f(1 - x_1, \dots, 1 - x_k),$$

and observe that the Fourier coefficients of $g$ of weight at least 1 and at most $k - N$ vanish. Let $\tau$ be the
distribution of the random variable $S = X_1 + \cdots + X_k$ under the measure induced by $g$ on $G$ (each
vertex $x \in G$ has probability proportional to $g(x)$). Note that this is a well defined probability
distribution since we assumed that $f$ and $1 - f$ are not the 0 function. Clearly $\tau$ is symmetric
about $k/2$ and has no mass in $(a, b)$, since both $f(x_1, \dots, x_k)$ and $f(1 - x_1, \dots, 1 - x_k)$ are 0 when
$x_1 + \cdots + x_k \in (a, b)$. The $s$-th moment with respect to the measure $\tau$ of the variable $S$ in Corollary
3.9 is the expression

$$M(\tau, s) = \frac{1}{F} \sum_{j=0}^{k} g_j \binom{k}{j} j^s,$$

where $g_j$ is the value of $g$ on inputs of weight $j$ and again $F = \sum_j g_j \binom{k}{j}$. By Corollary 3.9 this
must equal the $s$-th moment with respect to the binomial measure $\mu$, which is the quantity

$$M(\mu, s) = \frac{1}{2^k} \sum_{j=0}^{k} \binom{k}{j} j^s.$$

But the variance of $S$ under $\mu$ is

$$M(\mu, 2) - M(\mu, 1)^2 = \frac{k}{4},$$

since under $\mu$ the random variables $X_1, \dots, X_k$ are independent, while the variance of $S$ under $\tau$ is

$$M(\tau, 2) - M(\tau, 1)^2 \ge C\epsilon^2 k^2,$$

as half the mass of $\tau$ sits to the left of $\frac{1 - \epsilon}{2}k$ and half to the right of $\frac{1 + \epsilon}{2}k$. These orders of magnitude
are different whenever $\epsilon > C/\sqrt{k}$, which is true in our case as $\epsilon > C/\log k$. This contradiction
proves that $P$ cannot equal 0 on $A_2$. □

Extending $A_2$ to $[0, k - 1]$.

The rest of the proof goes as follows. By Lemma 3.10 we may assume that $P(l) = 1$ or $-1$ for
$l \in A_2$. Without loss of generality, assume $P$ is 1 on $A_2$. We apply Theorem 3.6 for $m = 4, 8, 16, \dots$
successively and each time we choose a prime $r$ such that $mr - N$ is minimized. Theorem 3.6 gives
for all $\nu \in \mathbb{Z}$

$$P(\nu) - \binom{m}{1} P(\nu + r) + \binom{m}{2} P(\nu + 2r) - \cdots + P(\nu + mr) = 0 \bmod r. \qquad (11)$$

When $\nu \in [0, k - N]$ the numbers $\nu + lr$ for even $l$ in (11) are in the set $A_{m/2}$ and therefore the
corresponding $P$ values are all 1, by induction on $m$. In order to deduce that (11) holds as an
identity of integers (not residue classes) it is enough to guarantee that the sum of the absolute
values of all terms is less than $r$. This amounts to the inequality $2^m < r$. Given that $mr \sim k$ this
is true if we can guarantee that

$$m < c_1 \log k, \qquad (12)$$

for some small enough constant $c_1$. Therefore, as long as $m$ satisfies the bound (12), we have that,
for $\nu \in [0, k - N]$,

$$P(\nu) - \binom{m}{1} P(\nu + r) + \binom{m}{2} P(\nu + 2r) - \cdots + P(\nu + mr) = 0. \qquad (13)$$

Since the total weights of the positive and negative terms in (13) are the same, it follows that the
$P(\nu + lr)$ terms corresponding to odd $l$ are also 1.

Each time we perform this operation we deduce that $P$ is 1 on a collection of intervals $A_m$
which consists of $A_{m/2}$ and one interval of length $\epsilon k$ in the middle of the gap between any two
successive intervals of $A_{m/2}$. So $A_m$ has $m + 1$ disjoint equispaced intervals of length $\epsilon k$. We apply
this operation until we have $\epsilon m \sim 1$, which implies that we have covered the whole interval $[0, k - 1]$
with our set $A_m$. We need to make sure that (12) still holds then. Since $\epsilon m \sim 1$ this is achieved by
setting $\epsilon = C/\log k$, for a large enough constant $C$. At the end of this process, there could still be
some very small possibly uncovered intervals of size $o(\epsilon k)$. However since we have already shown
that $P(l) = 1$ on a set of $k - o(\epsilon k)$ entries, we can use the fact that $P$ has degree at most $N - 1$ to
obtain that $P(l) = 1$ on the whole interval $[0, k - 1]$.
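The two ingredients of this extension step can be checked numerically on small parameters. The sketch below is an illustration with assumed names and toy values ($m = 4$, $r = 17$, so that $2^m < r$): the first function verifies the standard consequence of Lucas' Theorem that $\binom{mr}{j}$ is $\binom{m}{j/r}$ mod $r$ when $r$ divides $j$ and 0 otherwise, which is what reduces the recurrence to the alternating sum in (11); the loop then confirms that, once $P = 1$ at the even multiples of $r$, the only values in $\{-1, 0, 1\}$ at the odd multiples that satisfy the congruence are all equal to 1.

```python
from itertools import product
from math import comb

def lucas_consequence_holds(m, r):
    """For a prime r, check C(mr, j) mod r for every j: it equals
    C(m, j/r) mod r when r divides j and 0 otherwise."""
    for j in range(m * r + 1):
        expected = comb(m, j // r) % r if j % r == 0 else 0
        if comb(m * r, j) % r != expected:
            return False
    return True

def alternating_sum(values, m):
    """sum_l (-1)^l C(m, l) * values[l], the left-hand side of (11)."""
    return sum((-1) ** l * comb(m, l) * values[l] for l in range(m + 1))

m, r = 4, 17                    # toy values with r prime and 2**m < r
solutions = []
for odd in product((-1, 0, 1), repeat=m // 2):
    values = [1] * (m + 1)      # P = 1 at the even multiples of r
    values[1::2] = odd          # candidate values at the odd multiples
    if alternating_sum(values, m) % r == 0:
        solutions.append(odd)
```

With these toy values the alternating sum equals $8 - 4(P(\nu + r) + P(\nu + 3r))$, which lies strictly between $-r$ and $r$, so the congruence forces it to be 0 and both odd terms to be 1.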

This concludes the proof of Theorem 3.3, which implies:

Corollary 3.11. The class of symmetric $k$-juntas can be learned exactly under the uniform
distribution with confidence $1 - \delta$ in time $n^{O(k/\log k)} \cdot \mathrm{poly}(2^k, n, \log(1/\delta))$.

4 Discussion 

The main open question is to obtain tight upper and lower bounds on the running time of the 
Fourier-based algorithm for symmetric juntas. It may even be that for large k, every symmetric 
function has a non-zero Fourier coefficient of constant order. 

It should also be noted that in the case of balanced symmetric functions, i.e., symmetric
functions with $\Pr[f(x) = 1] = 1/2$, a bound of $O(k^{0.548})$ follows from [17] (see [?]). Hence to improve
our result, one may focus on finding new techniques for unbalanced functions.